Using an On-Line Dictionary to Find Rhyming Words and Pronunciations for Unknown Words
Abstract
Humans know a great deal about relationships among words. This paper discusses relationships among word pronunciations. We describe a computer system which models human judgement of rhyme by assigning specific roles to the location of primary stress, the similarity of phonetic segments, and other factors. By using the model as an experimental tool, we expect to improve our understanding of rhyme. A related computer model will attempt to generate pronunciations for unknown words by analogy with those for known words. The analogical processes involve techniques for segmenting and matching word spellings, and for mapping spelling to sound in known words. As in the case of rhyme, the computer model will be an important tool for improving our understanding of these processes. Both models serve as the basis for functions in the WordSmith automated dictionary system.

1. Introduction

This paper describes work undertaken in the development of WordSmith, an automated dictionary system being built by the Lexical Systems group at the IBM T. J. Watson Research Center. WordSmith allows the user to explore a multidimensional space of information about words. The system permits interaction with lexical databases through a set of programs that carry out functions such as displaying formatted entries from a standard dictionary and generating pronunciations for a word not found in the dictionary. WordSmith also shows the user words that are "close" to a given word along dimensions such as spelling (as in published dictionaries), meaning (as in thesauruses), and sound (as in rhyming dictionaries).

Figure 1 shows a sample of the WordSmith user interface. The current word, urgency, labels the text box at the center of the screen. The box contains the output of the PRONUNC application applied to the current word: it shows the pronunciation of urgency and the mapping between the word's spelling and pronunciation.
PRONUNC represents pronunciations in an alphabet derived from Webster's Seventh Collegiate Dictionary. In the pronunciation shown, "*" represents the vowel schwa, and ">" marks the vowel in the syllable bearing primary stress. Spelling-to-pronunciation mappings will be described in Section 3.

Three dimensions, displaying words that are neighbors of urgency, pass through the text box. Dimension one, extending from ureide to urinometric, contains words from the PRONUNC database which are close to urgency in alphabetical order. The second dimension (from somebody to company) shows words which are likely to rhyme with urgency. Dimension three (from pudency to pruriency) is based on a reverse alphabetical ordering of words, and displays words whose spellings end similarly to urgency. The RHYME and REVERSE dimensions are discussed below.
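The reverse alphabetical ordering behind the third dimension can be illustrated with a minimal sketch. It is not the WordSmith implementation: the function name `reverse_neighbors`, the parameter `k`, and the toy lexicon are assumptions made for illustration only. The idea shown is simply that sorting words by their reversed spellings places words with similar endings next to each other.

```python
from bisect import bisect_left


def reverse_neighbors(word, lexicon, k=2):
    """Return up to k words on each side of `word` in reverse-alphabetical order.

    Hypothetical helper for illustration; not part of WordSmith.
    """
    # Sort by reversed spelling so that words with similar endings are adjacent.
    ordered = sorted(set(lexicon) | {word}, key=lambda w: w[::-1])
    keys = [w[::-1] for w in ordered]
    # Locate the current word in the reversed-spelling order.
    i = bisect_left(keys, word[::-1])
    return ordered[max(0, i - k): i + k + 1]


# Toy lexicon (an assumption; a real lexicon would be far larger).
words = ["pudency", "pruriency", "company", "somebody"]
print(reverse_neighbors("urgency", words, k=1))
```

With this toy lexicon, the neighbors of urgency are the words whose spellings also end in "-ency"; an ordinary alphabetical dimension would use the same lookup with the unreversed spelling as the sort key.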
Related resources
Comparative objective and subjective evaluation of three data-driven techniques for proper name pronunciation
Automatic pronunciation of unknown words is a hard problem of great importance in speech technology. Proper names constitute an especially difficult class of words to pronounce because of their low frequency of occurrence and variable origin. In this paper, we compare three different data-driven approaches which use a dictionary of (known) proper names to infer pronunciations for unknown names,...
A Study of the Recording and Reading of Huzvaresh Words (Zand and Pazand) in Borhan-e Qate
Anjavi Shirazi was the first person to collect and record the middle Persian, especially Pahlavi language in a dictionary. Farhang-i Jahangiri (1005-1017 AH) has recorded a number of words of the Pahlavi language and writing. After Anjavi, these words have been recorded in other dictionaries, especially in the Borhan-e Qate. Some of these words are used in the language with the same pronunciati...
The Effect of Lexicon Composition in Pronunciation by Analogy
Pronunciation by analogy (PbA) is a data-driven approach to phonetic transcription that generates pronunciations for unknown words by exploiting the phonological knowledge implicit in the dictionary that provides the primary source of pronunciations. Unknown words typically include low-frequency ‘common’ words, proper names or neologisms that have not yet been listed in the lexicon. It is recei...
Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application
The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...
EFL Translation Students' Perspective toward Using Bilingual Dictionary in Translation of Polysemous Words
This research presented the use of bilingual dictionary and addressed the EFL translation students' points of view on the use of bilingual dictionary in translating polysemous words (English to Persian). Moreover, it aimed at finding the possible relationship between the effect of using bilingual dictionary by students in translating polysemous words and their achieved scores. In the study ...